Filtering
We can filter observations (rows) using the dplyr package’s filter()
function, which keeps only the rows that satisfy the conditions we
supply. Conditions separated by commas are combined with AND.
jan1 = filter(flights, month==1, day==1)
jan1
The above code filters the data such that it returns all flights that
were on January 1. Now how about all flights that occurred in November
or December?
nov_dec = filter(flights, month %in% c(11, 12))
nov_dec
Filtering with a condition is more convenient and less error-prone than
dropping rows by hand, because filter() returns only the observations
that satisfy your condition. Now let’s find all flights that were not
delayed by more than two hours on either departure or arrival (delay is
measured in minutes here)…
no_delay = filter(flights, arr_delay <= 120, dep_delay <= 120)
no_delay
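One detail worth knowing: filter() keeps only rows where the condition evaluates to TRUE, so rows where it evaluates to NA (for example, flights with no recorded delay) are silently dropped. The sketch below uses a small made-up tibble, toy_flights, instead of the real flights data; its values are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical toy data standing in for flights (values are made up)
toy_flights = tibble(
  flight    = c("A", "B", "C", "D"),
  dep_delay = c(10, 150, NA, 30),
  arr_delay = c(5, 140, NA, 200)
)

# Comma-separated conditions are combined with AND
within_two_hours = filter(toy_flights, arr_delay <= 120, dep_delay <= 120)

# Rows where the condition is NA are dropped; keep them explicitly if needed
keep_missing = filter(toy_flights, is.na(dep_delay) | dep_delay <= 120)

within_two_hours
keep_missing
```

If missing delays matter for your analysis, the second pattern makes the decision to keep them visible in the code.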
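Arranging
arrange() reorders the rows of a data frame by the values of one or
more columns. Wrapping a column in desc() sorts it in descending order.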
descending_delay = arrange(flights, desc(dep_delay))
descending_delay
Now let’s try ascending order by arrival delay time. arrange() sorts in
ascending order by default, so no wrapper is needed.
arr_ascend = arrange(flights, arr_delay)
arr_ascend
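As a small sketch of how arrange() treats ties and missing values, the example below sorts a made-up tibble (toy_delays is hypothetical, not part of nycflights13). Missing values are placed at the end whether or not desc() is used.

```r
library(dplyr)

# Hypothetical toy data (values are made up)
toy_delays = tibble(
  carrier   = c("UA", "AA", "UA", "AA"),
  arr_delay = c(15, NA, -5, 40)
)

# Ascending is the default; NA values go last
asc = arrange(toy_delays, arr_delay)

# Sort by carrier first, then by largest delay within each carrier
by_carrier = arrange(toy_delays, carrier, desc(arr_delay))

asc
by_carrier
```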
Data Retrieval
We can retrieve certain columns of data with the select() function
from dplyr.
select(flights, dep_time, sched_dep_time)
Let’s try retrieving the columns that don’t contain the string “time”
in their name. You can do the same kind of selection with starts_with()
and ends_with().
select(flights, !contains("time"))
We can move certain variables to the front by listing them first in
select() and ending with everything(), which picks up all remaining
columns.
select(flights, time_hour, air_time, everything())
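To see the selection helpers side by side, here is a minimal sketch on a one-row tibble whose column names mimic a few of those in flights (toy is hypothetical and its values are made up).

```r
library(dplyr)

# Hypothetical one-row tibble with flights-like column names
toy = tibble(dep_time = 517, arr_time = 830,
             dep_delay = 2, arr_delay = 11, carrier = "UA")

dep_cols   = names(select(toy, starts_with("dep")))
delay_cols = names(select(toy, ends_with("delay")))
no_time    = names(select(toy, !contains("time")))
reordered  = names(select(toy, carrier, everything()))

dep_cols
delay_cols
no_time
reordered
```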
Variable Creation and Modification
We can create new variables and modify values in them using mutate().
You can use conditional statements and even use other built-in functions
to specify how you want to transform variables. We’ve seen this before
when performing data cleaning last week, and we’ll see it applied later
again.
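Let’s create a new variable from the nycflights13 flights data that
gives the average speed of each flight in miles per hour (distance is
in miles and air_time is in minutes, hence the factor of 60). We’ll use
select() first to keep a smaller set of columns so the new variable is
easy to see.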
flights_sml = select(flights,
                     year:day,
                     ends_with("delay"),
                     distance,
                     air_time)
mutate(flights_sml, speed = distance / air_time * 60)
transmute() works like mutate() but keeps only the variables you’ve
created, dropping all the other columns of the tibble/dataframe.
transmute(flights_sml, speed = distance / air_time * 60)
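The sketch below illustrates the point that mutate() can combine arithmetic, conditional statements, and variables created earlier in the same call. The tibble toy_sml and the 400 mph cutoff are assumptions chosen only so the arithmetic is easy to check by hand.

```r
library(dplyr)

# Hypothetical toy data (values are made up)
toy_sml = tibble(
  distance  = c(600, 250),  # miles
  air_time  = c(120, 30),   # minutes
  dep_delay = c(20, 5),
  arr_delay = c(10, 15)
)

with_speed = mutate(
  toy_sml,
  speed = distance / air_time * 60,           # miles per hour
  gain  = dep_delay - arr_delay,              # minutes made up in the air
  pace  = if_else(speed > 400, "fast", "slow") # uses speed, created just above
)

with_speed
```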
Grouping and Summarizing
group_by() can be used to group your data by certain columns, and
summarize() then computes one row of results per group. na.rm is an
argument of many R summary functions, such as mean() and sum(). It
defaults to FALSE, and setting na.rm = TRUE removes NA values before
the calculation. We need it here (just like last week) because these
functions return NA if any input value is NA. Thus na.rm is a useful
argument when cleaning and transforming your data.
by_day = group_by(flights, year, month, day)
summarize(by_day, delay = mean(dep_delay, na.rm = TRUE))
Here is what happens when we call mean() without the na.rm argument:
any day with at least one missing dep_delay gets NA as its mean.
by_day = group_by(flights, year, month, day)
summarize(by_day, delay = mean(dep_delay))
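To make the effect of na.rm concrete, the following sketch summarizes a made-up tibble (toy_days is hypothetical) and also counts how many values were missing in each group, which is worth reporting whenever NA values are removed.

```r
library(dplyr)

# Hypothetical toy data: day 1 has one missing delay
toy_days = tibble(
  day       = c(1, 1, 2, 2),
  dep_delay = c(10, NA, 20, 40)
)

daily = summarize(
  group_by(toy_days, day),
  mean_raw   = mean(dep_delay),                # NA if any value is missing
  mean_clean = mean(dep_delay, na.rm = TRUE),  # ignores missing values
  n_flights  = n(),
  n_missing  = sum(is.na(dep_delay))
)

daily
```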
One-Hot Encoding: The Binary Side
Last week, we didn’t touch the categorical variables for a reason.
This week, we will use our knowledge of dplyr to encode them
numerically. The idea is to create a new column that assigns a
numerical value to each original category. This is commonly done
before data analysis and machine learning model fitting. Strictly
speaking, one-hot encoding creates one 0/1 indicator column per
category; for a categorical variable with two classes (a binary
variable), a single 0/1 column carries the same information, so that is
what we’ll build here.
Let’s take a look at the housing.csv dataset (provided in
Canvas).
library(tidyverse)
# Adjust the path to wherever you saved the file from Canvas
housing_data = read.csv("Housing.csv", header = TRUE)
head(housing_data)
Using the dplyr package within Tidyverse, we can use the mutate()
function and if_else().
new_housing = housing_data %>%
  mutate(
    mainroad_binary  = if_else(mainroad == "yes", 1, 0),
    guestroom_binary = if_else(guestroom == "yes", 1, 0),
    basement_binary  = if_else(basement == "yes", 1, 0),
    water_binary     = if_else(hotwaterheating == "yes", 1, 0)
  )
head(new_housing)
This technique is especially useful for preparing data before building
and fitting binary response models for classification tasks, logistic
regression in particular.
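When several columns share the same yes/no coding, the same if_else() logic can be applied to all of them at once with across(). This is only a sketch: toy_housing is a made-up stand-in for the housing data, and the "_binary" suffix simply mirrors the naming used above.

```r
library(dplyr)

# Hypothetical toy data with two yes/no columns
toy_housing = tibble(
  mainroad  = c("yes", "no", "yes"),
  guestroom = c("no", "no", "yes")
)

encoded = toy_housing %>%
  mutate(across(c(mainroad, guestroom),
                ~ if_else(.x == "yes", 1, 0),
                .names = "{.col}_binary"))  # keep originals, add new columns

encoded
```

Listing the columns explicitly inside across() keeps the encoding from touching columns that use a different coding.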
One-Hot Encoding: The Multiclass Case
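Now, we can apply the same structure to multiclass categorical
variables by nesting a second if_else() inside the “else” branch of the
first. Note that this assigns one integer code per category (often
called label or integer encoding) rather than a separate 0/1 column per
category, so the codes imply an ordering. Let’s use the housing.csv
dataset again, this time for the furnishing categories.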
housing_furn = new_housing %>%
  mutate(furnishing_multinom = if_else(
    furnishingstatus == "furnished", 1,
    if_else(furnishingstatus == "semi-furnished", 2, 3)
  ))
head(housing_furn)
Note that we used another if_else() inside of the first if_else()
statement to handle the 3 categories within the furnishingstatus
variable. This allows for multiple categorical values to be “chained”
and easily transformed.
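As a sketch of two alternatives to nesting, case_when() lists each category on its own line, and a separate 0/1 column per category gives one-hot columns in the strict sense. The tibble toy_furn and the new column names are hypothetical. Unlike the nested version, case_when() returns NA for a value that matches none of the listed categories, instead of silently assigning it the last code.

```r
library(dplyr)

# Hypothetical toy data using the three furnishing categories
toy_furn = tibble(
  furnishingstatus = c("furnished", "semi-furnished", "unfurnished", "furnished")
)

coded = toy_furn %>%
  mutate(
    # Integer codes, one line per category
    furnishing_code = case_when(
      furnishingstatus == "furnished"      ~ 1,
      furnishingstatus == "semi-furnished" ~ 2,
      furnishingstatus == "unfurnished"    ~ 3
    ),
    # One-hot columns: exactly one of these is 1 in each row
    furn_furnished   = if_else(furnishingstatus == "furnished", 1, 0),
    furn_semi        = if_else(furnishingstatus == "semi-furnished", 1, 0),
    furn_unfurnished = if_else(furnishingstatus == "unfurnished", 1, 0)
  )

coded
```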
This technique is useful for preparing data before fitting multinomial
(also known as multiclass) models, for example classification with
Feed-Forward Neural Networks (FNNs).
There are many ways to transform data using dplyr and other useful R
packages, and in fact, we’ll see similar logic return when we learn
about SQL and databases in the near future. But what happens when the
data we inspect doesn’t meet our expectations (e.g. what if we have
skewed data that isn’t quite normally distributed)? This may throw a
wrench into statistical procedures such as hypothesis testing and
fitting generalized linear models (GLMs). So let’s see our options
here.
Functional Data Transformations
We have seen numerous techniques for transforming dataframes by
manipulating rows and columns. Now we consider functional
transformations of continuous data. Often, our data can be transformed
using elementary mathematical functions. Note that functional
transformations won’t always be perfect, and may not always be
successful, but they are a nice skill to have in your repertoire.
We usually use these types of transformations to achieve one or more
of the following:
- reduce skewness in distributions
- stabilize variance
- make relationships more linear, where errors follow a normal
  distribution (we’ll see this play a role in the future when we cover
  regression analysis)
Consider the following simulated data.
set.seed(42)
x = runif(30, min=0, max=10)
y = x^2 + 1
y = y + rnorm(length(x), mean = 0, sd = 5)
sim_data = data.frame(x, y)
plot(sim_data$x, sim_data$y, col = "blue", main = "Simulated Data",
     xlab = "x", ylab = "y", pch = 19)

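This is a random sample of points scattered around a quadratic
function, namely x^2 + 1. But what happens if we perform a log
transformation on this data? (Because of the added noise, a few y
values near x = 0 may be zero or negative; log10() returns NaN for
negative inputs, with a warning, and those points are left off the
plot.)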
y = log10(y)
sim_data = data.frame(x, y)
plot(sim_data$x, sim_data$y, col = "blue", main = "Simulated Data (log10 scale)",
     xlab = "x", ylab = "log10(y)", pch = 19)

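And now the upward curvature is gone and the large values are
compressed, although the result is not exactly linear: log10(x^2 + 1)
bends the other way and levels off as x grows. This is the appeal of
functional transformations, as they can make a nonlinear relationship
much easier to handle with linear methods.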
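One rough way to compare candidate transformations is to check how strongly each transformed response correlates with x. The sketch below uses a noise-free version of the same quadratic so the result does not depend on the random draw. Using the Pearson correlation as a linearity measure is an assumption of this sketch, not the only option.

```r
# Noise-free version of the quadratic relationship used above
x = seq(0, 10, by = 0.5)
y = x^2 + 1

# Pearson correlation with x as a rough measure of linearity
r_raw  = cor(x, y)
r_log  = cor(x, log10(y))
r_sqrt = cor(x, sqrt(y))

round(c(raw = r_raw, log10 = r_log, sqrt = r_sqrt), 3)
```

In this noise-free case the square root comes out closest to a straight line, since sqrt(x^2 + 1) is nearly equal to x once x is large. The best transformation depends on the shape of the relationship, so it is worth trying more than one.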
Here are a few transformations that may be helpful for different
types of skewed data you may encounter:
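Moderate skew:
- sqrt(x) for positively (right) skewed data
- sqrt(max(x+1) - x) for negatively (left) skewed data
Strong skew:
- log10(x) for positively skewed data
- log10(max(x+1) - x) for negatively skewed data
Severe skew (inverse transformation):
- 1/x for positively skewed data
- 1/(max(x+1) - x) for negatively skewed data
Note that sqrt() needs non-negative values, log10() needs strictly
positive values, and 1/x is undefined at zero, so shift the data first
if necessary. The max(x+1) - x term reflects left-skewed data so that
its long tail points to the right and its smallest value is 1.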
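The following sketch applies the log10 versions of these rules to simulated data. Everything in it is an assumption made for illustration: the lognormal sample, the constant 100 used to build a mirrored left-skewed sample, and the hand-written skew() helper, which uses the moment-based definition of skewness so that no extra package is needed.

```r
set.seed(1)

# Moment-based sample skewness (hypothetical helper written out by hand)
skew = function(v) {
  d = v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

right_skewed = rlnorm(500)          # simulated, strictly positive, long right tail
left_skewed  = 100 - right_skewed   # mirror image, long left tail

# Positive skew: transform directly
skew_right_before = skew(right_skewed)
skew_right_after  = skew(log10(right_skewed))

# Negative skew: reflect first so the long tail points right, then transform
reflected        = max(left_skewed + 1) - left_skewed
skew_left_before = skew(left_skewed)
skew_left_after  = skew(log10(reflected))

c(skew_right_before, skew_right_after)
c(skew_left_before, skew_left_after)
```

Keep in mind that reflecting reverses the direction of the variable, so large values of the transformed variable correspond to small values of the original.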
Sometimes, we’ll use log(x), the natural logarithm, to improve
linearity and reduce heteroscedasticity (non-constant variance in your
data; more on this later when we do regression analysis).
As an example, we’ll be looking at the USJudgeRatings dataset.
# install.packages("moments")
library(tidyverse)
library(moments)
data("USJudgeRatings")
df = USJudgeRatings
head(df)
In this example, let’s consider the CONT variable, the number of
contacts between the judge and lawyers. Let’s check the skewness of this
variable first using skewness().
cont_skew = skewness(df$CONT, na.rm = TRUE)
cont_skew
[1] 1.085972
Now let’s check the density of the distribution. We’ll be doing more
of this when we go over data visualizations and EDA next week.
x = USJudgeRatings$CONT
data = data.frame(CONT = x)
mu = mean(x)
sigma = sd(x)
ggplot(data, aes(CONT)) +
  geom_density(fill = "lightgray", alpha = 0.5) +
  # normal curve with the sample mean and sd; linewidth replaces size
  # for lines in ggplot2 3.4.0 and later
  stat_function(fun = dnorm, args = list(mean = mu, sd = sigma),
                color = "red", linewidth = 1) +
  labs(title = "Density of CONT with Normal Curve", x = "CONT", y = "Density") +
  theme_minimal()

The skewness is positive (about 1.09), so the distribution is right
skewed. Let’s try a log10 transformation.
df$CONT = log10(df$CONT)
x = df$CONT
data = data.frame(CONT = x)
mu = mean(x)
sigma = sd(x)
ggplot(data, aes(CONT)) +
  geom_density(fill = "lightgray", alpha = 0.5) +
  # normal curve with the mean and sd of the transformed values
  stat_function(fun = dnorm, args = list(mean = mu, sd = sigma),
                color = "red", linewidth = 1) +
  labs(title = "Density of log10(CONT) with Normal Curve", x = "log10(CONT)", y = "Density") +
  theme_minimal()

And we see that our data has been transformed to fit better with the
normal distribution (the superimposed red line). While it’s not
perfect, it is an improvement over the original density of CONT.
Rerunning skewness() on the transformed variable is a quick way to
quantify the change.
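A visual comparison can be backed up with a number. The sketch below computes the skewness of CONT before and after the log10 transformation using a hand-written moment-based helper. The helper is an assumption of this sketch and is only presumed to be comparable to skewness() from the moments package.

```r
# Moment-based sample skewness (hypothetical helper, assumed comparable
# to moments::skewness())
skew = function(v) {
  d = v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

cont = USJudgeRatings$CONT   # built-in dataset, untransformed values

skew_before = skew(cont)
skew_after  = skew(log10(cont))

c(before = skew_before, after = skew_after)
```

If the skewness after the transformation is still well above zero, a stronger transformation from the list above may be worth trying.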